butions from redundant and noisy data is considered. A strategy is proposed which
evolves through the following steps: i) independent constraints are first pre-selected
by recourse to a data-independent technique to be discussed here; ii) the data are
a posteriori used to determine the parameters of the distribution by a previously
introduced forward approach; iii) a backward approach is proposed for reducing
the parameters of such a distribution. The previously introduced forward approach is
generalised here in order to make it suitable for dealing with very noisy data.
I. INTRODUCTION



Among the generalised non-extensive MaxEnt distributions, which are defined in terms of a
parameter q [1-3], the one corresponding to the value q = 1/2 has played a particular role in diverse
contexts [4-9].

In this paper we focus on developing strategies for constructing the q = 1/2 distribution which is
involved in a very special type of inverse problem: the problem of constructing such a distribution on
the basis of redundant and noisy data (by noise we mean errors resulting from the random process
associated with the experimental measurement procedure).

It is appropriate to start by discussing why we shall restrict consideration to the particular value 
q = 1/2. 

The problem of determining a probability distribution p maximising the entropy

$$S_q = \frac{\sum_{n=1}^{N} p_n^q - 1}{1-q}$$

with constraints

$$f^o_i = \sum_{n=1}^{N} p_n^q f_{i,n}, \quad i = 1,\ldots,M, \qquad 1 = \sum_{n=1}^{N} p_n,$$

has been shown in [6] to be numerically equivalent to determining the probability distribution p
minimising

$$\|p\|_{1/q}^{1/q} = \sum_{n=1}^{N} p_n^{1/q}$$

with constraints

$$f^o_i = \sum_{n=1}^{N} p_n f_{i,n}, \quad i = 1,\ldots,M, \qquad 1 = \sum_{n=1}^{N} p_n.$$

Since $p_n \ge 0$, $\|p\|_{1/q}$ is indeed the 1/q-norm of p. Thus, the problem of choosing the parameter q is
equivalent to deciding which norm one wants to minimise while preserving the 1-norm of the distribution.
In order to analyse the situation further, let us join all the constraints together by defining an (M+1) × N
matrix A of elements $A_{i,n} = f_{i,n}$; i = 1,...,M; n = 1,...,N and $A_{M+1,n} = 1$; n = 1,...,N.
Hence, the constraints are expressed in the form

$$f^o = A\,p,$$

where $f^o$ is a vector of (M+1) components $f^o_1, \ldots, f^o_M, 1$. It is well known from linear algebra that
the general solution to this under-determined linear system can be expressed as

$$p = A^{\dagger} f^o + p',$$

where $A^{\dagger}$ is the pseudoinverse of A and $p'$ is a vector in the null space of A. Consequently,
the problem of deciding on the q-parameter is tantamount to just choosing a vector $p'$ in the null
space of A. In particular, the choice q = 1/2 (which, as already discussed, is equivalent to minimising
the 2-norm of the distribution p) amounts to setting $p' = 0$. This follows from the fact that, since the vectors
$A^{\dagger} f^o$ and $p'$ are orthogonal to each other, one has

$$\|p\|_2^2 = \|A^{\dagger} f^o\|_2^2 + \|p'\|_2^2 .$$

Hence, by setting $p' = 0$ the solution of minimum 2-norm is obtained. For a number of reasons
that we spell out below, we believe that this leads to the most suitable choice of the parameter q in
relation to our problem. Indeed,

• The under-determined problem we have to solve is of the following special nature: we have
fewer independent equations than unknowns, but there is a large number of redundant equations
and a number of irrelevant ones [7]. If the data were noiseless, the role of such equations would
simply be to verify the ability of the distribution to make correct predictions. Since the data
are noisy, we use all the equations with the purpose of reducing the effect of the noise, but not
as independent constraints (in most cases the number of Lagrange multipliers is much smaller than
the actual number of available constraints). Our task is to identify a subset of such independent
constraints. The predictive power of our solution is assessed a posteriori by its capability of
predicting the denoised data.

• The constraints typically represent measurements obtained as a function of some variable
parameters: intensity vs. diffraction angle, magnetisation vs. magnetic field, etc. [12,13]. It is
then natural to represent such measurements as linear functionals acting on the same vector. Each
linear functional provides a projection on the particular parameter value which is specified by
the measurement instrument state [12]. It is clear then that in the space of the data it is
appropriate to define a distance through the norm induced by the inner product. In our formalism
both the space of the data and the space of the system are assumed to be Hilbert spaces. The
only 1/q-norm induced by an inner product is the one corresponding to q = 1/2.

• As mentioned above, choosing a value of q other than q = 1/2 would imply letting the
corresponding distribution have a component in the null space of the transformation generated
by the constraints. In the type of problem described in the previous item such a null space is
of a 'chaotic' nature (in the sense that an arbitrarily small numerical perturbation of any of the
elements of matrix A would produce an enormous distortion in the solution). We certainly
wish to avoid this.
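The orthogonal decomposition behind the choice $p' = 0$ is easy to check numerically. The following Python/NumPy sketch is only an illustration under assumed inputs (a random matrix of $f_{i,n}$ values and a random distribution, neither taken from the paper): it builds the (M+1) × N matrix A, computes the pseudoinverse solution, adds an arbitrary null-space component $p'$, and confirms that the constraints still hold while the 2-norm can only grow. Positivity of p is not enforced in this sketch.

```python
import numpy as np

def min_norm_solution(A, f0):
    """Minimum 2-norm solution p = A^+ f0 of the under-determined system A p = f0."""
    return np.linalg.pinv(A) @ f0

def random_null_vector(A, rng, tol=1e-12):
    """Random vector p' in the null space of A, built from the right singular vectors."""
    _, s, vt = np.linalg.svd(A)            # full_matrices=True, so vt is N x N
    rank = int(np.sum(s > tol * s[0]))
    basis = vt[rank:].T                    # columns span null(A)
    return basis @ rng.standard_normal(basis.shape[1])

rng = np.random.default_rng(0)
M, N = 3, 8
F = rng.random((M, N))                     # hypothetical f_{i,n}
A = np.vstack([F, np.ones((1, N))])        # last row enforces sum_n p_n = 1
p_true = rng.random(N)
p_true /= p_true.sum()
f0 = A @ p_true                            # consistent data vector (f_1, ..., f_M, 1)

p_min = min_norm_solution(A, f0)
p_prime = random_null_vector(A, rng)
p_other = p_min + p_prime                  # another solution of A p = f0

print(np.allclose(A @ p_other, f0))        # the constraints still hold
print(abs(p_min @ p_prime) < 1e-10)        # A^+ f0 is orthogonal to null(A)
print(np.linalg.norm(p_other) > np.linalg.norm(p_min))
```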

Unfortunately, in our context, deciding on the appropriate q-value of the distribution we wish to
construct does not solve the problem of its optimal construction. While it is true that the problem
of determining the q = 1/2 distribution from a fixed set of constraints is a simple linear problem [5],
the problem becomes highly nonlinear when this distribution is to be determined optimally from a
subset of constraints which are taken out of a much larger set of possible ones.

Consider that from a set of M constraints we want to select a subset of k and associate a
parameter (Lagrange multiplier) with each selected equation. Let us denote by $p^{(k)}$ the distribution
associated with the corresponding k equations. Hence the problems we have to face are the following: a) the
selection of the optimal k constraints; b) the estimation of the corresponding k parameters determining
the distribution. In order to address these problems, let us specify the meaning of 'optimal selection'
in our context: we say that a selection is optimal if it yields a distribution capable of satisfactorily
predicting all the available data while involving the minimum number of parameters. Unfortunately, the
search for such an optimal selection is not in general feasible, as it poses an NP-hard problem, i.e.,
one for which no polynomial-time algorithm is known [10,11]. Hence we are forced to devise
suitable suboptimal strategies, which also poses an open problem because there is no unique way
of constructing suboptimal solutions.

In some recent publications we have introduced a suboptimal iterative strategy, which is only
optimal at each iteration step [7,8]. Such an approach is a forward, data-dependent approach to
subset selection. At each iteration the indices obtained in the previous steps are fixed, and a new
index is chosen in such a way that the distance between the observed data and the ones predicted
by the physical model is minimised. Since the selection is only optimal at each step, the selected set
of indices is, of course, not optimal in the sense specified above. Some indices that are relevant at a
particular step may become much less relevant at the end of the process. It is then natural to try to
eliminate the parameters corresponding to such indices. Again, reducing parameters in
an optimal way is in general an NP-hard problem and we need to address it by suboptimal strategies. Here
we propose a strategy for reducing parameters that we call backward selection. This new approach
provides both the criterion for selecting the parameters to be deleted and the technique for properly
modifying the ones to be retained. An approach for selecting independent constraints in the absence
of data will also be advanced here, with the aim of designing a new suboptimal strategy consisting
of the following steps:

i) Before the experiment is carried out, we select a subset of indices corresponding to independent
constraints.

ii) The forward selection approach proposed in [8] is then applied for selecting indices from the
pre-selected set, in order to construct the distribution when the data are available.

iii) Finally, the backward selection approach is applied in order to further reduce the number of
parameters of the distribution. Such backward selection is made fast and efficient by
means of a backward adaptive biorthogonalization technique.

Before advancing the strategy described above, we would like to discuss how it is possible to
adapt the strategy of [8] so as to make it suitable for dealing with very noisy data. This is achieved
by introducing a vector space with an inner product defined with respect to a measure depending on
the experimental data, or on their corresponding statistics.

The paper is organised as follows: the generalisation of the previous approach, which makes it suitable
for dealing with very noisy data, is introduced in section II. Section III discusses the criteria for
selecting relevant constraints. First, the selection criterion proposed in [7] is generalised and a
numerical experiment is presented in order to illustrate the advantage of such a generalisation. We then
discuss a new data-independent selection criterion. In section III C we introduce a backward procedure
for eliminating constraints and, consequently, for properly adapting the concomitant parameters of
the distribution. Sections III B and III C provide the foundations of a new strategy that we illustrate by
a numerical example in section III D. The conclusions are drawn in section IV.
II. GENERALISING THE PREVIOUS APPROACH



Let us assume that we are given M pieces of data $f^o_1, f^o_2, \ldots, f^o_i, \ldots, f^o_M$, each of which is the
expectation value of a random variable that takes values $f_{i,n}$; n = 1,...,N according to the q = 1/2
probability distribution $p_n$; n = 1,...,N [7,8], i.e.,

$$f^o_i = \sum_{n=1}^{N} p_n f_{i,n}, \quad i = 1,\ldots,M. \qquad (1)$$

The data $f^o_1, f^o_2, \ldots, f^o_M$ will be represented as components of a vector in a vector space,
say $V^M$. A central aim of this contribution is to allow for the possibility of assigning a different
weight to each datum. Accordingly, the inner product in $V^M$, which we indicate as ${}_\mu\langle\cdot|\cdot\rangle_\mu$, is defined
with respect to a measure μ as follows: for every f and g in $V^M$

$${}_\mu\langle f|g\rangle_\mu = \sum_{i=1}^{M} \bar f_i\, g_i\, \mu_i \qquad (2)$$

where $\bar f_i$ indicates the complex conjugate of $f_i$. In the present situation we deal with real vectors,
thereby $\bar f_i = f_i$. The data space, with the corresponding associated measure, will be denoted as
$V^M(\mu)$ and the standard orthogonal basis in $V^M(\mu)$ will be represented by vectors $|i\rangle_\mu$; i = 1,...,M.
The identity operator in $V^M(\mu)$ is thus expressed as:

$$I = \sum_{i=1}^{M} |i\rangle_\mu\, \mu_i\, {}_\mu\langle i| \qquad (3)$$

with vectors $|i\rangle_\mu$; i = 1,...,M satisfying the relations

$${}_\mu\langle i|j\rangle_\mu = \frac{\delta_{i,j}}{\mu_i} \quad (\text{or } 0 \text{ if } \mu_i = 0). \qquad (4)$$

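Accordingly, vector $|f^o\rangle_\mu$ is expressed as

$$|f^o\rangle_\mu = \sum_{i=1}^{M} |i\rangle_\mu\, \mu_i\, {}_\mu\langle i|f^o\rangle_\mu = \sum_{i=1}^{M} \mu_i f^o_i\, |i\rangle_\mu. \qquad (5)$$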

The measure μ, rendering a weighted distance between two vectors in $V^M(\mu)$, will be chosen in
relation to the observed data. For example, if the variances of the data are known and we denote by
$\sigma_i^2$ the variance of datum $f^o_i$, the choice $\mu_i = \sigma_i^{-2}$ gives rise to the squared distance between $|f^o\rangle_\mu$ and
$|g\rangle_\mu \in V^M(\mu)$ given by:

$$\big\| |f^o\rangle_\mu - |g\rangle_\mu \big\|^2 = {}_\mu\langle f^o - g | f^o - g\rangle_\mu = \sum_{i=1}^{M} \frac{(f^o_i - g_i)^2}{\sigma_i^2}. \qquad (6)$$



The above distance is known to be optimal, in a maximum likelihood sense, if the data errors are 
Gaussian distributed [14]. 
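A minimal sketch of the weighted inner product (2) and of the distance (6), assuming real data and the choice $\mu_i = \sigma_i^{-2}$; the function names and the three data values are made up for illustration. With this measure the squared distance reduces to the familiar chi-square sum.

```python
import numpy as np

def weighted_inner(f, g, mu):
    """Inner product mu<f|g>_mu = sum_i conj(f_i) g_i mu_i, Eq. (2)."""
    return np.sum(np.conj(f) * g * mu)

def weighted_sq_distance(f, g, mu):
    """Squared distance || |f> - |g> ||^2 induced by the weighted inner product."""
    d = f - g
    return float(np.real(weighted_inner(d, d, mu)))

# Hypothetical observed data, model prediction and standard deviations
f_obs = np.array([1.0, 2.0, 3.0])
g_pred = np.array([1.1, 1.8, 3.3])
sigma = np.array([0.1, 0.2, 0.3])

mu = sigma ** -2                                   # measure mu_i = sigma_i^{-2}
chi2 = np.sum((f_obs - g_pred) ** 2 / sigma ** 2)  # right-hand side of Eq. (6)
print(weighted_sq_distance(f_obs, g_pred, mu), chi2)
```

Setting `mu = np.ones_like(sigma)` recovers the uniform (unweighted) distance used in [7].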

The space of the physical system is considered to be the Euclidean N-dimensional real space $\mathbb{R}^N$.
The standard orthogonal basis in $\mathbb{R}^N$ will be indicated by vectors $|n\rangle$; n = 1,...,N, so that every
vector $|r\rangle \in \mathbb{R}^N$ is represented as:

$$|r\rangle = \sum_{n=1}^{N} |n\rangle\langle n|r\rangle = \sum_{n=1}^{N} r_n |n\rangle. \qquad (7)$$

For any two vectors $|v\rangle$ and $|r\rangle$ in $\mathbb{R}^N$ the inner product is defined as:

$$\langle v|r\rangle = \sum_{n=1}^{N} \langle v|n\rangle\langle n|r\rangle = \sum_{n=1}^{N} v_n r_n. \qquad (8)$$

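Using the adopted vector notation, equations (1) are recast as

$$|f^o\rangle_\mu = A|p\rangle \qquad (9)$$

with

$$|p\rangle = \sum_{n=1}^{N} |n\rangle\langle n|p\rangle = \sum_{n=1}^{N} p_n |n\rangle \qquad (10)$$

and operator $A: \mathbb{R}^N \to V^M(\mu)$ given by

$$A = \sum_{n=1}^{N} |f_n\rangle_\mu \langle n|. \qquad (11)$$

Vectors $|f_n\rangle_\mu \in V^M(\mu)$ are defined in such a way that ${}_\mu\langle i|f_n\rangle_\mu = f_{i,n}$, i.e.,

$$|f_n\rangle_\mu = \sum_{i=1}^{M} |i\rangle_\mu\, \mu_i\, {}_\mu\langle i|f_n\rangle_\mu = \sum_{i=1}^{M} \mu_i f_{i,n} |i\rangle_\mu. \qquad (12)$$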



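In line with [7], in order to determine the MaxEnt distribution $|p\rangle$ we consider as constraints of the
optimisation process a subset of k equations (1), labelled by indices $l_j$; j = 1,...,k. This leads to
the following expression for the distribution:

$$|p^{(k)}\rangle = \frac{1}{N}\Big(1 - \sum_{j=1}^{k} {}_\mu\langle l_j|\lambda^{(k)}\rangle_\mu\, {}_\mu\langle g|l_j\rangle_\mu\Big) \sum_{n=1}^{N} |n\rangle + \sum_{j=1}^{k} \sum_{n=1}^{N} |n\rangle\, {}_\mu\langle f_n|l_j\rangle_\mu\, {}_\mu\langle l_j|\lambda^{(k)}\rangle_\mu \qquad (13)$$

with

$$|g\rangle_\mu = \sum_{n=1}^{N} |f_n\rangle_\mu = \sum_{n=1}^{N} A|n\rangle. \qquad (14)$$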

n=l n=l 



The superscript k in $|p^{(k)}\rangle$ indicates that the distribution is built out of k constraints.
The Lagrange multiplier vector $|\lambda^{(k)}\rangle_\mu$ is determined by the requirement that $|p^{(k)}\rangle$ predicts a
complete data vector $|f^p\rangle_\mu = A|p^{(k)}\rangle \in V^M(\mu)$ minimising the distance to the observed vector $|f^o\rangle_\mu$.
This is actually the prescription given in [7]. Nevertheless, the fact that here the distance is defined
with respect to a measure, which we propose to be dependent on the experimental data, implies
that the formalism of [7] needs to be adapted to this requirement. In the next subsection we discuss
how this can be achieved in a straightforward manner by means of a recursive biorthogonalization
technique for computing the Lagrange multipliers which determine $|p^{(k)}\rangle$.
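For the reader's convenience, here is a sketch of where the structure of (13) comes from. It assumes the q = 1/2 problem in its equivalent 2-norm form (minimise $\sum_n p_n^2$ subject to the k selected equations (1) and to normalisation), writes $\lambda_j$ for ${}_\mu\langle l_j|\lambda^{(k)}\rangle_\mu$, and uses an auxiliary multiplier $\lambda_0$ for normalisation; it is our reconstruction of the reasoning rather than a derivation quoted from [7].

```latex
\begin{aligned}
\mathcal{L} &= \sum_{n=1}^{N} p_n^2
  - 2\sum_{j=1}^{k} \lambda_j \Big( \sum_{n=1}^{N} p_n f_{l_j,n} - f^o_{l_j} \Big)
  - 2\lambda_0 \Big( \sum_{n=1}^{N} p_n - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial p_n} = 0
  &\;\Longrightarrow\; p_n = \lambda_0 + \sum_{j=1}^{k} \lambda_j f_{l_j,n}, \\
\sum_{n=1}^{N} p_n = 1
  &\;\Longrightarrow\; \lambda_0 = \frac{1}{N}\Big( 1 - \sum_{j=1}^{k} \lambda_j g_{l_j} \Big),
  \qquad g_i = \sum_{n=1}^{N} f_{i,n}, \\
f^p_i = \sum_{n=1}^{N} f_{i,n}\, p_n
  &= \frac{g_i}{N} + \sum_{j=1}^{k} \lambda_j
     \Big( \sum_{n=1}^{N} f_{i,n} f_{l_j,n} - \frac{1}{N}\, g_i\, g_{l_j} \Big).
\end{aligned}
```

The bracket in the last line coincides with the i-th component of the vector $|\alpha_{l_j}\rangle_\mu$ of (16), and $g_i/N$ is the i-th component of $\frac{1}{N}|g\rangle_\mu$. Hence $|f^p\rangle_\mu - \frac{1}{N}|g\rangle_\mu = \sum_j \lambda_j |\alpha_{l_j}\rangle_\mu$, which is why the fit in (15) is made against $|\tilde f^o\rangle_\mu$.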



A. Determination of Lagrange multipliers 

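In order to estimate the Lagrange multipliers determining (13) we minimise the distance between
the prediction through the physical model and the observed data. As discussed in [7,8], this entails
determining the Lagrange multipliers as

$$\sum_{j=1}^{k} |\alpha_{l_j}\rangle_\mu\, {}_\mu\langle l_j|\lambda^{(k)}\rangle_\mu = F_k|\lambda^{(k)}\rangle_\mu = P_{V_k}|\tilde f^o\rangle_\mu \qquad (15)$$

where we have denoted $F_k = \sum_{j=1}^{k} |\alpha_{l_j}\rangle_\mu\, {}_\mu\langle l_j|$ with

$$|\alpha_{l_j}\rangle_\mu = \sum_{n=1}^{N} |f_n\rangle_\mu\, {}_\mu\langle f_n|l_j\rangle_\mu - \frac{1}{N}\, |g\rangle_\mu\, {}_\mu\langle g|l_j\rangle_\mu. \qquad (16)$$

The vector $|\tilde f^o\rangle_\mu$ is obtained from the data vector as $|\tilde f^o\rangle_\mu = |f^o\rangle_\mu - \frac{1}{N}|g\rangle_\mu$, and $P_{V_k}$ is the orthogonal projector
onto the subspace $V_k$ spanned by $|\alpha_{l_j}\rangle_\mu$; j = 1,...,k. Here we wish this projector to account for the
different weights of the data. This will be achieved by recourse to a biorthogonalization technique [15]
which, as applied in this context, produces biorthogonal vectors dependent on the weight assigned
to each datum.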
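A direct (non-recursive) sketch of (13)-(16) in component form, assuming real data: the columns of `alpha` hold the components $\sum_n f_{i,n} f_{l,n} - g_i g_l/N$ of the vectors (16), and the multipliers solve the μ-weighted least-squares problem (15), obtained here by scaling rows with $\sqrt{\mu_i}$ and calling `numpy.linalg.lstsq`. The helper names, the random $f_{i,n}$ and the chosen indices are hypothetical. The recursive biorthogonal route described next yields the same multipliers.

```python
import numpy as np

def alpha_vectors(F):
    """Columns alpha[:, l] = sum_n f_{i,n} f_{l,n} - g_i g_l / N (cf. Eq. (16)); also returns g."""
    N = F.shape[1]
    g = F.sum(axis=1)                          # g_i = sum_n f_{i,n}, cf. Eq. (14)
    return F @ F.T - np.outer(g, g) / N, g

def lagrange_multipliers(F, f_obs, mu, idx):
    """Multipliers for the selected indices idx, minimising the mu-weighted distance (15)."""
    N = F.shape[1]
    alpha, g = alpha_vectors(F)
    f_tilde = f_obs - g / N                    # |f~o> = |fo> - (1/N)|g>
    w = np.sqrt(mu)                            # row scaling turns the mu-norm into the 2-norm
    lam, *_ = np.linalg.lstsq(w[:, None] * alpha[:, idx], w * f_tilde, rcond=None)
    return lam

def distribution(F, lam, idx):
    """q = 1/2 distribution built from the k selected constraints, Eq. (13)."""
    N = F.shape[1]
    g = F.sum(axis=1)
    return (1.0 - lam @ g[idx]) / N + lam @ F[idx, :]

rng = np.random.default_rng(1)
M, N = 6, 10
F = rng.random((M, N))                         # hypothetical f_{i,n}
p_true = rng.random(N)
p_true /= p_true.sum()
f_obs = F @ p_true                             # noiseless data, Eq. (1)
mu = rng.uniform(0.5, 2.0, M)                  # hypothetical positive weights
idx = [0, 2, 4]                                # hypothetical selected indices l_j

lam = lagrange_multipliers(F, f_obs, mu, idx)
p_k = distribution(F, lam, idx)
print(p_k.sum())                               # normalisation holds by construction
```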

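Given a set of vectors $|\alpha_{l_n}\rangle_\mu$; n = 1,...,M, we set $|\psi_1\rangle_\mu = |\alpha_{l_1}\rangle_\mu$, $|\beta^1_1\rangle_\mu = |\psi_1\rangle_\mu / \|\psi_1\|_\mu^2$, and inductively define vectors
$|\beta^{k+1}_{k+1}\rangle_\mu$ as

$$|\beta^{k+1}_{k+1}\rangle_\mu = \frac{|\psi_{k+1}\rangle_\mu}{\|\psi_{k+1}\|_\mu^2} \qquad (17)$$

with

$$|\psi_{k+1}\rangle_\mu = |\alpha_{l_{k+1}}\rangle_\mu - P_{V_k}|\alpha_{l_{k+1}}\rangle_\mu. \qquad (18)$$

The dual vectors ${}_\mu\langle\beta^{k+1}_n|$; n = 1,...,k+1, which are obtained from the recursive equations

$${}_\mu\langle\beta^{k+1}_n| = {}_\mu\langle\beta^{k}_n| - {}_\mu\langle\beta^{k}_n|\alpha_{l_{k+1}}\rangle_\mu\, {}_\mu\langle\beta^{k+1}_{k+1}|, \quad n = 1,\ldots,k, \qquad {}_\mu\langle\beta^{k+1}_{k+1}| = \frac{{}_\mu\langle\psi_{k+1}|}{\|\psi_{k+1}\|_\mu^2}, \qquad (19)$$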

satisfy the following properties:

• a) they are biorthogonal with respect to the vectors $|\alpha_{l_n}\rangle_\mu$; n = 1,...,k+1, i.e.,

$${}_\mu\langle\beta^{k+1}_n|\alpha_{l_m}\rangle_\mu = \delta_{n,m}, \quad n = 1,\ldots,k+1; \; m = 1,\ldots,k+1; \qquad (20)$$

• b) they provide a representation of the orthogonal projection operator onto $V_{k+1}$ as given by:

$$P_{V_{k+1}} = \sum_{n=1}^{k+1} |\alpha_{l_n}\rangle_\mu\, {}_\mu\langle\beta^{k+1}_n| = \sum_{n=1}^{k+1} |\beta^{k+1}_n\rangle_\mu\, {}_\mu\langle\alpha_{l_n}|. \qquad (21)$$

The proof of a) and b) parallels that of [15,16] for the case of the standard Euclidean measure.
It follows from (21) and (15) that the Lagrange multipliers yielding $|p^{(k+1)}\rangle$ are obtained according
to the recursive relation

$${}_\mu\langle l_n|\lambda^{(k+1)}\rangle_\mu = {}_\mu\langle l_n|\lambda^{(k)}\rangle_\mu - {}_\mu\langle\beta^{k}_n|\alpha_{l_{k+1}}\rangle_\mu\, {}_\mu\langle l_{k+1}|\lambda^{(k+1)}\rangle_\mu, \quad n = 1,\ldots,k,$$
$${}_\mu\langle l_{k+1}|\lambda^{(k+1)}\rangle_\mu = {}_\mu\langle\beta^{k+1}_{k+1}|\tilde f^o\rangle_\mu. \qquad (22)$$

In writing down the above equations we have assumed that the indices $l_n$; n = 1,...,k+1
are given to us. Of course, we must choose them somehow. How? The question does not possess a
unique suitable answer. We tackle this problem below.
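The recursion (17)-(19) together with the multiplier update (22) can be sketched as follows for real vectors. This is a minimal illustration, not the authors' code: `forward_biorthogonal` and `ip_mu` are hypothetical names, the input vectors are assumed linearly independent, and no re-orthogonalization is performed (a practical implementation may need it for ill-conditioned sets). The final lines cross-check the multipliers against a direct weighted least-squares fit and verify the biorthogonality (20).

```python
import numpy as np

def ip_mu(u, v, mu):
    """Weighted inner product mu<u|v>_mu for real vectors, Eq. (2)."""
    return float(np.sum(mu * u * v))

def forward_biorthogonal(alphas, f_tilde, mu):
    """Recursive biorthogonalization, Eqs. (17)-(19), with the multiplier update (22).

    alphas  : list of 1-D arrays |alpha_{l_1}>, ..., |alpha_{l_K}> (assumed independent)
    f_tilde : 1-D array |f~o>
    mu      : 1-D array of positive weights
    Returns the dual vectors |beta_n^K> and the multipliers <l_n|lambda^(K)>.
    """
    betas, lam = [], []
    for k, a in enumerate(alphas):
        # Eq. (18): psi = alpha - P_{V_k} alpha, with P_{V_k} = sum_n |alpha_n><beta_n^k|
        psi = a.astype(float)
        for n in range(k):
            psi = psi - alphas[n] * ip_mu(betas[n], a, mu)
        new = psi / ip_mu(psi, psi, mu)                          # Eq. (17)
        lam_new = ip_mu(new, f_tilde, mu)                        # Eq. (22), second line
        coeffs = [ip_mu(b, a, mu) for b in betas]                # mu<beta_n^k|alpha_{l_{k+1}}>_mu
        betas = [b - new * c for b, c in zip(betas, coeffs)]     # Eq. (19)
        lam = [lv - c * lam_new for lv, c in zip(lam, coeffs)]   # Eq. (22), first line
        betas.append(new)
        lam.append(lam_new)
    return betas, np.array(lam)

rng = np.random.default_rng(3)
M, K = 10, 4
alphas = [rng.standard_normal(M) for _ in range(K)]   # hypothetical |alpha_{l_n}>
f_tilde = rng.standard_normal(M)                      # hypothetical |f~o>
mu = rng.uniform(0.5, 2.0, M)                         # hypothetical weights

betas, lam = forward_biorthogonal(alphas, f_tilde, mu)

# Cross-check against a direct weighted least-squares fit of Eq. (15)
A = np.column_stack(alphas)
w = np.sqrt(mu)
lam_ls, *_ = np.linalg.lstsq(w[:, None] * A, w * f_tilde, rcond=None)
print(np.allclose(lam, lam_ls))

G = np.array([[ip_mu(b, a, mu) for a in alphas] for b in betas])
print(np.allclose(G, np.eye(K)))                      # biorthogonality, Eq. (20)
```

Adding one more vector only requires one pass over the existing duals, which is what makes the forward selection of the next section inexpensive.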



III. SELECTION OF INDICES 

The problem of deciding on the indices $l_n$; n = 1,...,k to be considered in the construction of
the $|p^{(k)}\rangle$ distribution is far from being a simple one. One would like, of course, to choose the smallest
set of indices allowing one to minimise the distance between the observed vector and the physical model.
Unfortunately, as already mentioned, the search for a global minimum is an NP-hard problem in
most cases. A sensible simplification is obtained by abandoning the goal of global minimisation and
accepting a less ambitious suboptimal solution which arises from the following iterative procedure:
at each iteration the indices obtained in the previous steps are fixed, and a new index is chosen so
as to minimise the distance between the data vector and the vector predicted by the physical model.
This is basically the strategy of the forward selection approach proposed in [7,8]. Such a strategy,
useful indeed in many situations, is just one among the many possible suboptimal strategies that one
can envisage. Here we advance a new approach which is built out of two main ingredients: i) a
data-independent technique for selecting constraints, to be discussed in section III B, and ii) a backward
selection approach for reducing the number of parameters of a given distribution. To address the
latter we need a technique evolving in the reverse direction with respect to the forward technique of
[7,8]. In this case the two challenges we have to face are: a) deciding on the parameters to
be eliminated, and b) appropriately modifying the parameters one wishes to retain. These two
points are addressed in section III C by recourse to a backward biorthogonalization approach. Before
advancing the new strategy we would like to illustrate how the forward selection approach of [7,8]
can be adapted in a straightforward manner in order to make it suitable for dealing with very
noisy data. This is the subject of section III A.



A. Data dependent selection criterion 

As proposed in [7,8], a set of indices $l_n$; n = 1,...,k+1 can be iteratively determined by
selecting, at iteration k+1, the index $l_{k+1}$ corresponding to a vector $|\alpha_{l_{k+1}}\rangle_\mu$ (cf. Eq. (16)) that
minimises the norm of the residual resulting when approximating the observed data by the physical
model. This process is tantamount to selecting the index $l_{k+1}$ that maximises the functionals [7]:

$$e_n = |{}_\mu\langle\gamma_n|\tilde f^o\rangle_\mu|^2, \quad n = 1,\ldots,M, \qquad (23)$$

with $|\gamma_n\rangle_\mu = |\psi_n\rangle_\mu / \|\psi_n\|_\mu$ and $|\psi_n\rangle_\mu = |\alpha_n\rangle_\mu - P_{V_k}|\alpha_n\rangle_\mu$.
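A greedy sketch of the forward selection driven by (23), under stated assumptions: to keep it short, the subspace $V_k$ is tracked with a μ-orthonormal Gram-Schmidt basis (a substitute for the biorthogonal representation (21) that yields the same projector), candidates that are numerically inside $V_k$ are skipped using an assumed relative tolerance, and the optional `candidates` argument allows restricting the search to a pre-selected set. All names and the synthetic test data are hypothetical.

```python
import numpy as np

def ip_mu(u, v, mu):
    """Weighted inner product mu<u|v>_mu for real vectors."""
    return float(np.sum(mu * u * v))

def forward_select(alpha, f_tilde, mu, k_max, tol=0.0, candidates=None, eps=1e-12):
    """Greedy forward selection maximising e_n of Eq. (23).

    alpha      : (M, L) array whose column n is |alpha_n>_mu
    candidates : allowed column indices (e.g. a pre-selected set); default: all
    Stops after k_max indices, or once the squared mu-norm of the residual is <= tol.
    """
    cand = list(range(alpha.shape[1])) if candidates is None else list(candidates)
    selected, Q = [], []                                  # Q: mu-orthonormal basis of V_k
    residual = np.array(f_tilde, dtype=float)             # |f~o> - P_{V_k}|f~o>
    while len(selected) < k_max and cand and ip_mu(residual, residual, mu) > tol:
        best, best_e, best_q = None, -1.0, None
        for n in cand:
            psi = alpha[:, n].astype(float)
            for q in Q:                                   # psi_n = alpha_n - P_{V_k} alpha_n
                psi = psi - q * ip_mu(q, alpha[:, n], mu)
            norm2 = ip_mu(psi, psi, mu)
            if norm2 <= eps * ip_mu(alpha[:, n], alpha[:, n], mu):
                continue                                  # numerically inside V_k: skip
            # <psi_n|f~o> equals <psi_n|residual> because psi_n is orthogonal to V_k
            e = ip_mu(psi, residual, mu) ** 2 / norm2     # e_n = |<gamma_n|f~o>|^2
            if e > best_e:
                best, best_e, best_q = n, e, psi / np.sqrt(norm2)
        if best is None:
            break
        selected.append(best)
        cand.remove(best)
        Q.append(best_q)
        residual = residual - best_q * ip_mu(best_q, residual, mu)
    return selected, residual

rng = np.random.default_rng(4)
M = 12
alpha = rng.standard_normal((M, 5)) @ rng.standard_normal((5, M))  # hypothetical rank-5 set
f_tilde = alpha[:, [1, 7]] @ np.array([2.0, -1.0])                 # lies in span{alpha_1, alpha_7}
mu = np.ones(M)                                                     # uniform measure
selected, residual = forward_select(alpha, f_tilde, mu, k_max=5)
print(selected, ip_mu(residual, residual, mu))
```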

At this point, we would like to illustrate the advantage of allowing different weights for each datum.
We use the same example as in [7], i.e., the data are generated as

$$f^o_i = \sum_{n=1}^{50} p_n f_{i,n} + e_i, \quad i = 1,\ldots,100, \qquad (24)$$

with $p_n$ represented by the continuous line of Figure 1, and $f_{i,n} = \exp(-n x_i)$; $x_i = 0.01\, i$; i =
1,...,100; n = 1,...,50. This is an extremely ill-conditioned problem. In order to obtain a good
approximation of the distribution of Figure 1, it was assumed in [7] that the data are known within an
uncertainty of 0.1%. Here we consider the errors to be much larger: each datum is distorted by a zero-mean
Gaussian random variable $e_i$ of variance $\sigma_i^2$ corresponding to 20% of the data value. If,
as in [7], we consider a uniform measure ($\mu_i = 1$), the approximation we obtain is represented by the
dotted lines of Figure 1a (for 2 different realizations of the data). As we clearly gather from Figure
1b, by considering a nonuniform measure given as $\mu_i = \sigma_i^{-2}$; i = 1,...,100 the approximation is
enormously improved and becomes stable against different realizations of the data.
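A sketch of how the synthetic data of (24) and the two measures can be set up. Two assumptions are made here and should not be read as the paper's exact settings: the distribution `p_true` is a hypothetical Gaussian bump standing in for the curve of Figure 1 (which is not reproduced in this text), and "20% of the data value" is read as a standard deviation $\sigma_i = 0.2 f_i$. The condition number printed at the end quantifies how ill-conditioned the kernel $f_{i,n}$ is.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 100, 50
x = 0.01 * np.arange(1, M + 1)                  # x_i = 0.01 i
n = np.arange(1, N + 1)
F = np.exp(-np.outer(x, n))                     # f_{i,n} = exp(-n x_i)

# Hypothetical stand-in for the distribution of Figure 1 (not the paper's curve)
p_true = np.exp(-0.5 * ((n - 20) / 5.0) ** 2)
p_true /= p_true.sum()

f_clean = F @ p_true                            # noiseless data, Eq. (24) without e_i
sigma = 0.2 * f_clean                           # assumption: standard deviation = 20% of each datum
f_obs = f_clean + sigma * rng.standard_normal(M)

mu_uniform = np.ones(M)                         # measure used in [7]
mu_weighted = sigma ** -2                       # mu_i = sigma_i^{-2}
print(f"condition number of F: {np.linalg.cond(F):.1e}")
```

Either measure can then be passed to a weighted least-squares or forward-selection routine to compare the reconstructions.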



B. Data independent selection criterion 

This alternative criterion for selecting indices is independent of the actual data. It is meant to
speed up the posterior selection process and is grounded on the fact that redundant equations arise as
a consequence of the physical model. Hence, redundancy can be detected without the actual realization
of the experimental measurements. In our formalism each constraint, say the $l_k$-th one, is associated with
a vector $|\alpha_{l_k}\rangle_\mu$. Hence the problem of discriminating linearly independent constraints is equivalent
to the problem of discriminating linearly independent vectors. We address this problem by recourse
to a recently introduced technique [17], which allows for a hierarchical selection giving rise to a stable
inverse problem. The goal is achieved by selecting, at each step, the index $l_k$ maximising the ratios:

$$r_n = \frac{\|\psi_n\|_\mu}{\|\alpha_n\|_\mu}, \quad n = 1,\ldots,M. \qquad (25)$$

This data-independent technique for eliminating redundancy makes the posterior data processing
much faster, as the selection of indices for constructing the distribution can be carried out only on
those indices rendering independent vectors. There is also room for different post-processing
strategies because, especially when the data are very noisy, the number of required Lagrange multipliers
happens to be smaller than the number of indices rendering 'numerical independence'. One
possibility is to apply the selection criterion discussed in the previous section, but only on the preselected
indices. Additional reduction of Lagrange multipliers is made possible by a backward strategy to be
introduced in the next section.
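A minimal sketch of the hierarchical, data-independent pre-selection driven by (25). It rests on our reading that $\psi_n$ is the component of $\alpha_n$ orthogonal (in the μ inner product) to the span of the vectors already selected, and that the selection stops once every remaining ratio falls below a small threshold; the threshold value and the function name are assumptions, not details taken from [17].

```python
import numpy as np

def ip_mu(u, v, mu):
    """Weighted inner product mu<u|v>_mu for real vectors."""
    return float(np.sum(mu * u * v))

def preselect_independent(alpha, mu, threshold=1e-8):
    """Pick, at each step, the column maximising r_n = ||psi_n|| / ||alpha_n|| (Eq. (25)).

    psi_n = alpha_n - P_{V_k} alpha_n is kept up to date by a modified Gram-Schmidt sweep.
    Selection stops when max r_n < threshold (an illustrative stopping rule).
    """
    L = alpha.shape[1]
    norms = np.array([np.sqrt(ip_mu(alpha[:, n], alpha[:, n], mu)) for n in range(L)])
    psi = alpha.astype(float)                        # column n holds psi_n for the current V_k
    remaining = [n for n in range(L) if norms[n] > 0]
    selected = []
    while remaining:
        r = {n: np.sqrt(ip_mu(psi[:, n], psi[:, n], mu)) / norms[n] for n in remaining}
        best = max(r, key=r.get)
        if r[best] < threshold:
            break                                    # everything left is numerically redundant
        q = psi[:, best] / np.sqrt(ip_mu(psi[:, best], psi[:, best], mu))
        selected.append(best)
        remaining.remove(best)
        for n in remaining:                          # psi_n <- psi_n - q <q|psi_n>
            psi[:, n] -= q * ip_mu(q, psi[:, n], mu)
    return selected

rng = np.random.default_rng(6)
M = 30
alpha = rng.standard_normal((M, 6)) @ rng.standard_normal((6, 40))  # 40 vectors in a 6-D subspace
mu = np.ones(M)
sel = preselect_independent(alpha, mu)
print(len(sel), sel)
```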
C. Reducing Lagrange Multipliers



As already discussed, the fact that Lagrange multipliers are associated with constraints that are
selected on a step-by-step basis implies that at the end of the selection process some Lagrange
multipliers may have diminished relevance. To be in a position to eliminate Lagrange multipliers of
little relevance we need to develop an appropriate technique.

Consider that we wish to reduce the number k of Lagrange multipliers characterising a $|p^{(k)}\rangle$
distribution. Even if we know which particular parameters should be disregarded, the actual process of
removing them yields a nonlinear problem. The nonlinearity follows from (15), where the Lagrange
multipliers in the left-hand side of the equations are the coefficients of a linear superposition of
non-orthogonal vectors. The right-hand side indicates that such a superposition is the orthogonal
projection of the vector $|\tilde f^o\rangle_\mu$ onto the subspace generated by the vectors $|\alpha_{l_j}\rangle_\mu$; j = 1,...,k. Thus,
within the framework of this Communication, the decision to eliminate some Lagrange multipliers
comes along with the requirement that the remaining superposition be the orthogonal projection of $|\tilde f^o\rangle_\mu$ onto the reduced subspace.
This entails that we must recalculate the remaining Lagrange multipliers. The need for recalculating
the coefficients of a non-orthogonal linear expansion when eliminating some others is discussed in [18],
where a backward biorthogonalization approach is advanced. Such a technique, which we describe
next, has been devised in order to modify biorthogonal vectors so as to appropriately represent the
orthogonal projector onto a reduced subspace.

Let us recall that $V_k = \mathrm{span}\{|\alpha_{l_1}\rangle_\mu, \ldots, |\alpha_{l_k}\rangle_\mu\}$ and let $V_{k/\alpha_{l_j}}$ denote the subspace which is left by
removing the vector $|\alpha_{l_j}\rangle_\mu$ from $V_k$, i.e.,

$$V_{k/\alpha_{l_j}} = \mathrm{span}\{|\alpha_{l_1}\rangle_\mu, \ldots, |\alpha_{l_{j-1}}\rangle_\mu, |\alpha_{l_{j+1}}\rangle_\mu, \ldots, |\alpha_{l_k}\rangle_\mu\}. \qquad (26)$$

We have already discussed how to construct the orthogonal projector onto $V_k$ (cf. Eq. (21)).
In order to represent the orthogonal projector onto the reduced subspace $V_{k/\alpha_{l_j}}$, the corresponding
biorthogonal vectors $|\beta^k_n\rangle_\mu$ need to be modified as established by the following theorem.
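Theorem 1: Given a set of vectors $|\beta^k_n\rangle_\mu$; n = 1,...,k biorthogonal to vectors $|\alpha_{l_n}\rangle_\mu$; n = 1,...,k
and yielding a representation of $P_{V_k}$ as given in (21), a new set of biorthogonal vectors $|\beta^{k/j}_n\rangle_\mu$;
n = 1,...,j−1, j+1,...,k yielding a representation of $P_{V_{k/\alpha_{l_j}}}$ as given by

$$P_{V_{k/\alpha_{l_j}}} = \sum_{\substack{n=1\\ n\neq j}}^{k} |\alpha_{l_n}\rangle_\mu\, {}_\mu\langle\beta^{k/j}_n| = \sum_{\substack{n=1\\ n\neq j}}^{k} |\beta^{k/j}_n\rangle_\mu\, {}_\mu\langle\alpha_{l_n}| \qquad (27)$$

can be obtained from vectors $|\beta^k_n\rangle_\mu$; n = 1,...,k through the following equations:

$$|\beta^{k/j}_n\rangle_\mu = |\beta^k_n\rangle_\mu - \frac{|\beta^k_j\rangle_\mu\, {}_\mu\langle\beta^k_j|\beta^k_n\rangle_\mu}{\|\beta^k_j\|_\mu^2}, \quad n = 1,\ldots,j-1, j+1,\ldots,k. \qquad (28)$$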



The proof of this Theorem, as well as that of Corollary 2 below, is given in [16,18].
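The update (28) is a one-line operation on the dual vectors. The sketch below checks it numerically for real vectors: the initial duals are obtained directly from the Gram matrix ($B = A\,G^{-1}$ with $G_{mn} = {}_\mu\langle\alpha_{l_m}|\alpha_{l_n}\rangle_\mu$, a shortcut rather than the recursion (17)-(19)), one vector is removed with (28), and the result is compared with duals recomputed from scratch for the reduced set. Function names and data are hypothetical.

```python
import numpy as np

def dual_vectors(A, mu):
    """Duals biorthogonal to the columns of A w.r.t. mu<.|.>_mu, i.e. B^T diag(mu) A = I.
    Computed directly from the Gram matrix (a shortcut, not the recursion (17)-(19))."""
    G = A.T @ (mu[:, None] * A)                 # G_mn = mu<alpha_m|alpha_n>_mu
    return A @ np.linalg.inv(G)

def remove_dual(B, j, mu):
    """Backward update of Eq. (28): drop column j and adapt the remaining duals."""
    bj = B[:, j]
    coef = (B.T @ (mu * bj)) / np.sum(mu * bj * bj)   # mu<beta_j|beta_n>_mu / ||beta_j||^2
    return np.delete(B - np.outer(bj, coef), j, axis=1)

rng = np.random.default_rng(7)
M, k, j = 9, 5, 2
A = rng.standard_normal((M, k))                 # hypothetical |alpha_{l_n}> as columns
mu = rng.uniform(0.5, 2.0, M)                   # hypothetical weights
B = dual_vectors(A, mu)
B_red = remove_dual(B, j, mu)
A_red = np.delete(A, j, axis=1)

print(np.allclose(B_red.T @ (mu[:, None] * A_red), np.eye(k - 1)))   # biorthogonal again
print(np.allclose(B_red, dual_vectors(A_red, mu)))                   # same as recomputing
```

The update costs one pass over the k duals, whereas recomputing requires inverting a new Gram matrix.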
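Corollary 1: Let the Lagrange multiplier vector $|\lambda^k\rangle_\mu$ satisfying (15) be given. Then the Lagrange
multiplier vector $|\lambda^{k/j}\rangle_\mu$ giving rise to the orthogonal projection onto the reduced subspace $V_{k/\alpha_{l_j}}$ is
obtained from the previous $|\lambda^k\rangle_\mu$ as follows:

$${}_\mu\langle l_n|\lambda^{k/j}\rangle_\mu = {}_\mu\langle l_n|\lambda^k\rangle_\mu - \frac{{}_\mu\langle\beta^k_n|\beta^k_j\rangle_\mu}{\|\beta^k_j\|_\mu^2}\, {}_\mu\langle l_j|\lambda^k\rangle_\mu, \quad n = 1,\ldots,j-1, j+1,\ldots,k. \qquad (29)$$

The proof trivially stems from (15) using (28) in (27), since $P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu = \sum_{n\neq j} |\alpha_{l_n}\rangle_\mu\, {}_\mu\langle l_n|\lambda^{k/j}\rangle_\mu$
implies ${}_\mu\langle\beta^{k/j}_n|\tilde f^o\rangle_\mu = {}_\mu\langle l_n|\lambda^{k/j}\rangle_\mu$. □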

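Corollary 2: The following relation between $\|P_{V_k}|\tilde f^o\rangle_\mu\|$ and $\|P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|$ holds:

$$\|P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|^2 = \|P_{V_k}|\tilde f^o\rangle_\mu\|^2 - \frac{|{}_\mu\langle l_j|\lambda^k\rangle_\mu|^2}{\|\beta^k_j\|_\mu^2}. \qquad (30)$$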
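Both corollaries can be checked numerically on random data. In the sketch below (real vectors, hypothetical names and data, duals obtained directly from the Gram matrix rather than by recursion) the multipliers updated with (29) are compared with a fresh weighted least-squares fit on the reduced set, and the loss in squared projection norm is compared with the right-hand side of (30).

```python
import numpy as np

rng = np.random.default_rng(8)
M, k, j = 10, 5, 3
A = rng.standard_normal((M, k))                  # hypothetical columns |alpha_{l_n}>
mu = rng.uniform(0.5, 2.0, M)                    # hypothetical weights
f = rng.standard_normal(M)                       # hypothetical |f~o>

def ip(u, v):
    """mu<u|v>_mu for real vectors."""
    return float(np.sum(mu * u * v))

# Duals |beta_n^k> with B^T diag(mu) A = I (direct Gram-matrix shortcut)
B = A @ np.linalg.inv(A.T @ (mu[:, None] * A))
lam = B.T @ (mu * f)                             # lambda_n = mu<beta_n^k|f~o>_mu

# Corollary 1, Eq. (29): multipliers after removing index j
bj, nb = B[:, j], ip(B[:, j], B[:, j])
lam_red = np.array([lam[n] - ip(B[:, n], bj) / nb * lam[j] for n in range(k) if n != j])

# Reference: weighted least squares on the reduced set
A_red = np.delete(A, j, axis=1)
w = np.sqrt(mu)
lam_ref, *_ = np.linalg.lstsq(w[:, None] * A_red, w * f, rcond=None)
print(np.allclose(lam_red, lam_ref))

# Corollary 2, Eq. (30): loss in the squared norm of the projection
proj_full, proj_red = A @ lam, A_red @ lam_red
print(np.isclose(ip(proj_red, proj_red), ip(proj_full, proj_full) - lam[j] ** 2 / nb))
```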

Corollary 1 gives us a prescription to modify the Lagrange multipliers characterising a k-parameter
distribution if one of these multipliers is to be removed. Nevertheless, the question still has to be
addressed as to how to choose the Lagrange multiplier to be disregarded. Corollary 2 suggests how
the selection can be made optimal. The following proposition is in order.

Proposition 1: Let the Lagrange multipliers ${}_\mu\langle l_n|\lambda^k\rangle_\mu$; n = 1,...,k and ${}_\mu\langle l_n|\lambda^{k/j}\rangle_\mu$; n = 1,...,j−1,
j+1,...,k be obtained from (15) and (29), respectively. The Lagrange multiplier ${}_\mu\langle l_j|\lambda^k\rangle_\mu$ to be
removed in order to minimise the norm of the residual error $|\Delta\rangle_\mu = P_{V_k}|\tilde f^o\rangle_\mu - P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu$ is the one
yielding the minimum value of the quantities

$$\frac{|{}_\mu\langle l_j|\lambda^k\rangle_\mu|^2}{\|\beta^k_j\|_\mu^2}, \quad j = 1,\ldots,k. \qquad (31)$$


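Proof: Since, on the one hand, $P_{V_k}P_{V_{k/\alpha_{l_j}}} = P_{V_{k/\alpha_{l_j}}}P_{V_k} = P_{V_{k/\alpha_{l_j}}}$ and, on the other hand, orthogonal
projectors are idempotent, we have:

$$\|P_{V_k}|\tilde f^o\rangle_\mu - P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|^2 = {}_\mu\langle\tilde f^o|P_{V_k}|\tilde f^o\rangle_\mu - {}_\mu\langle\tilde f^o|P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu = \|P_{V_k}|\tilde f^o\rangle_\mu\|^2 - \|P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|^2. \qquad (32)$$

Making use of (30), we further have

$$\|P_{V_k}|\tilde f^o\rangle_\mu - P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|^2 = \frac{|{}_\mu\langle l_j|\lambda^k\rangle_\mu|^2}{\|\beta^k_j\|_\mu^2}. \qquad (33)$$

It follows then that $\|P_{V_k}|\tilde f^o\rangle_\mu - P_{V_{k/\alpha_{l_j}}}|\tilde f^o\rangle_\mu\|^2$ is minimum if the quantity (31) is minimum. □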
Successive applications of criterion (31) lead to an algorithm for recursive backward approximation
of the distribution. Indeed, let us assume that at the first iteration we eliminate the j-th constraint,
yielding a minimum of (31). We then construct the new reciprocal vectors (28) and the corresponding
new Lagrange multipliers as prescribed in (29), and repeat. The process is to be stopped if the approximated
distribution fails to predict the observed data within the required margin.
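The backward procedure just described can be sketched as a loop over (31), (28) and (29). This is a minimal illustration under stated assumptions: real vectors, initial duals computed directly from the Gram matrix, and a stopping rule that refuses a removal whenever the squared μ-residual $\||\tilde f^o\rangle_\mu - P|\tilde f^o\rangle_\mu\|^2$ would exceed a user-supplied tolerance. By (33) each removal increases that residual by exactly the minimised quantity (31), so the residual can be tracked without recomputing projections. The function name, tolerance and test data are hypothetical.

```python
import numpy as np

def backward_eliminate(A, f, mu, max_sq_residual):
    """Greedy backward elimination using the criterion of Eq. (31).

    A : (M, k) array whose columns are |alpha_{l_n}> (assumed independent)
    f : 1-D array |f~o>;  mu : 1-D array of positive weights
    Returns the kept column indices and their updated multipliers.
    """
    B = A @ np.linalg.inv(A.T @ (mu[:, None] * A))       # initial duals (direct computation)
    lam = B.T @ (mu * f)                                  # lambda_n = mu<beta_n|f~o>_mu
    keep = list(range(A.shape[1]))
    r = f - A @ lam
    sq_res = float(np.sum(mu * r * r))                    # ||f~o - P_{V_k} f~o||_mu^2
    while len(keep) > 1:
        nb = np.sum(mu[:, None] * B * B, axis=0)          # ||beta_n||_mu^2
        cost = lam ** 2 / nb                              # Eq. (31)
        j = int(np.argmin(cost))
        if sq_res + cost[j] > max_sq_residual:            # residual increase from Eq. (33)
            break
        bj = B[:, j]
        coef = (B.T @ (mu * bj)) / nb[j]                  # mu<beta_n|beta_j>_mu / ||beta_j||^2
        lam = np.delete(lam - coef * lam[j], j)           # Eq. (29)
        B = np.delete(B - np.outer(bj, coef), j, axis=1)  # Eq. (28)
        keep.pop(j)
        sq_res += float(cost[j])
    return keep, lam

rng = np.random.default_rng(9)
M, k = 40, 6
A = rng.standard_normal((M, k))                 # hypothetical |alpha_{l_n}> as columns
mu = np.ones(M)                                 # uniform measure
f = A[:, [0, 3]] @ np.array([3.0, -2.0]) + 0.01 * rng.standard_normal(M)
keep, lam = backward_eliminate(A, f, mu, max_sq_residual=0.1)
print(keep, lam)
```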

D. Numerical example 

We illustrate here a strategy consisting of the following steps: i) we use the data-independent
selection criterion for discriminating independent constraints; ii) we apply the data-dependent
selection criterion on the previously selected indices; iii) the number of Lagrange multipliers obtained at
step ii) is reduced and the remaining multipliers are re-computed.
We consider the example described below.

The physical model yielding the matrix elements $f_{i,n}$ is given by the Lorentzian decays:

$$f_{i,n} = \frac{1}{1 + 0.01\,(\cdots - n)^2}, \quad i = 1,\ldots,700; \; n = 1,\ldots,450. \qquad (34)$$

We construct 700 vectors $|\alpha_n\rangle_\mu$; n = 1,...,700 as prescribed in (16) and select indices corresponding
to linearly independent vectors by the technique for eliminating redundancy described above. Out
of the redundant set of 700 vectors we found 100 that are linearly independent up to good precision,
which is assessed by the biorthogonality quality of the corresponding basis and its reciprocal (dual).
The experimental measurements were generated considering that the distribution characterising the
physical system is the sum of 5 Gaussian functions, represented by the continuous line of Figure 2. Each
datum was distorted by a random error of variance $\sigma_i^2$ corresponding to 10% of the data value. A
realization of these data is shown in Figure 3. The inversion problem in this example is much more
stable than the one in the previous example, so that the results do not vary much by weighting the
data. Hence, in order to illustrate this strategy, we use a uniform measure in all the procedures
involved. Out of the pre-selected linearly independent vectors, by using the data-dependent strategy we
selected between 8 and 12 (depending on the particular realization of the data), so as to predict
the 700 pieces of data within the uncertainty with which the data were generated, i.e., we require that
$\||f^p\rangle_\mu - |f^o\rangle_\mu\|^2 < \||e\rangle_\mu\|^2$, where $|e\rangle_\mu$ is a vector of components $e_i = t\,\sigma_i$ where, in general, t is a real
number in the interval [1, 3]. In this case we first set t = 1.1. The approximation of the corresponding
distribution is depicted by the dotted lines of Figure 2a (for 5 different realizations of the data). We
then increased the value of t to t = 2 and applied the proposed strategy for reducing Lagrange
multipliers. In spite of the fact that the number of parameters was significantly reduced (only 5
were kept), as can be seen in Figure 2b the distribution is still a good approximation of the original
one. The prediction of the data by this distribution is also of high quality: as shown in Figure
4, the predicted data are very close to the noiseless ones. Notice that, by recourse to our approach,
we are able to denoise and compress 700 data points by using only 5 Lagrange multipliers.
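The stopping rule used in this example can be written as a small predicate. The sketch assumes real vectors and the weighted norm of (2); `within_noise` and the numbers in the usage lines are hypothetical, with a 10% uncertainty chosen only to echo the example.

```python
import numpy as np

def within_noise(f_pred, f_obs, sigma, mu, t=1.1):
    """True if || |f^p> - |f^o> ||_mu^2 < || |e> ||_mu^2, with e_i = t * sigma_i."""
    d = np.asarray(f_pred, dtype=float) - np.asarray(f_obs, dtype=float)
    e = t * np.asarray(sigma, dtype=float)
    return float(np.sum(mu * d * d)) < float(np.sum(mu * e * e))

f_obs = np.array([1.0, 2.0, 4.0])                # hypothetical data
sigma = 0.1 * f_obs                              # hypothetical 10% uncertainty
mu = np.ones(3)                                  # uniform measure, as in the example
f_pred = f_obs + 0.5 * sigma                     # prediction half a sigma away
print(within_noise(f_pred, f_obs, sigma, mu, t=1.1))   # True
```

With the weighting $\mu_i = \sigma_i^{-2}$ the right-hand side reduces to $t^2 M$, so the test becomes a chi-square-type bound; raising t from 1.1 to 2, as done above, loosens the bound and leaves room for removing multipliers.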

IV. CONCLUSIONS 

In this paper we have considered the problem of constructing the q = 1/2 MaxEnt distribution
from redundant and noisy data. A previously developed approach has been generalised here in
order to incorporate, in a straightforward manner, information on the data errors. The
advantage of this generalised approach when dealing with very noisy data has been illustrated by
a numerical simulation.

Additionally, a new strategy for selecting relevant constraints has been advanced. The corresponding
implementation consists of two different steps. The first step is independent of the actual data, as
it operates by discriminating independent equations on the basis of the physical model. The data
are used, a posteriori, to further reduce the number of constraints. The latter process is carried out
through a forward and backward procedure as follows: first, the selection is made starting from an
initial constraint and incorporating others, one by one, until the observed data are predicted within a
predetermined precision. Afterwards, the number of parameters of the distribution is further reduced
by applying a backward selection criterion for eliminating some of the Lagrange multipliers and
recalculating the remaining ones. It should be stressed that the combination of the forward and backward
procedures is not, in general, equivalent to stopping the forward approach at a corresponding earlier
stage. The irreversibility of the process is a consequence of the fact that, due to the complexity of the
problem, the implementation of a selection criterion aiming at global optimisation is not possible.
The strategies we have presented here are only optimal at each operational step. Hence, they do not
generate reversible procedures.

Considering the complexity of the mathematical problem posed by the aim of constructing,
in an optimal way, the q = 1/2 MaxEnt distribution from redundant and noisy constraints, we believe
that the well-founded suboptimal strategies employed here should be of utility in a broad
range of situations.



ACKNOWLEDGEMENTS 

Support from EPSRC (GR/R86355/01) is acknowledged. 






FIGURES 




Figure 1a: The theoretical distribution is represented by the solid line. Each dotted line corresponds
to the approximation we obtain by using a uniform measure ($\mu_i = 1$) for 2 different realizations of
the data.

Figure 1b: The theoretical distribution is represented by the solid line. Each dotted line corresponds
to the approximation we obtain (for 5 different realizations of the experiment) by weighting each datum
with a measure $\mu_i = \sigma_i^{-2}$.

Figure 2a: The theoretical distribution is represented by the solid line. The dotted lines correspond
to the approximation we obtain for 5 different realisations of the data. Each line is constructed by
iteratively selecting constraints out of the reduced set obtained by the data-independent technique.

Figure 2b: The theoretical distribution is represented by the solid line. Each dotted line represents
the approximation of the corresponding one in Figure 2a, after the elimination of some parameters.

Figure 3: The simulated data after distortion by random noise.

Figure 4: The theoretical data are represented by the continuous line. The dotted line corresponds
to the predictions obtained by means of the approximation of Figure 2b.